Parallel LDPC Decoder

ABSTRACT

An LDPC decoder that implements an iterative message-passing algorithm, where the improvement includes a pipeline architecture such that the decoder accumulates results for row operations during column operations, such that additional time and memory are not required to store results from the row operations beyond that required for the column operations.

FIELD

This patent application is a continuation of and claims all rights and priority on prior pending U.S. patent application Ser. No. 11/565,670 filed 2006.12.01. This invention relates to the field of integrated circuit fabrication. More particularly, this invention relates to an efficient, parallel low-density parity-check (LDPC) decoder for a special class of parity check matrices that reduces the amount of memory and time that are required for the necessary calculations.

BACKGROUND

LDPC code is typically a linear stream of data in a self-correcting format that can be represented by an (m,n)-matrix with a relatively small, fixed number of ones (nonzero for arbitrary GF(q)) in each row and column, where m is the number of check bits and n is the code length in bits.

The most famous algorithm for decoding LDPC codes is called the iterative message-passing algorithm. Each iteration of this algorithm consists of two stages. In stage 1 (the row operations), the algorithm computes messages for all of the check nodes (the rows). In stage 2 (the column operations), the algorithm computes messages for all of the bit nodes (the columns), and sends them back to the check nodes associated with the given bit nodes. There are many different implementations of this message-passing algorithm, but all of them use two-stage operations. Further, in each of these implementations, the second stage starts only after all of the messages for all of the rows have been calculated.

As with all information processing operations, it is desirable for the procedure to operate as quickly as possible, while consuming as few resources as possible. Unfortunately, LDPC codes such as those described above typically require a relatively significant overhead in terms of the time and the memory required for them to operate.

What is needed is an LDPC code that operates in a more efficient manner, such as by reducing the amount of time or the amount of memory that is required by the operation.

SUMMARY

The above and other needs are met by an LDPC decoder that implements an iterative message-passing algorithm, where the improvement includes a pipeline architecture such that the decoder accumulates results for row operations during column operations, such that additional time and memory are not required to store results from the row operations beyond that required for the column operations.

BRIEF DESCRIPTION OF THE DRAWINGS

Further advantages of the invention are apparent by reference to the detailed description when considered in conjunction with the figures, which are not to scale so as to more clearly show the details, wherein like reference numbers indicate like elements throughout the several views, and wherein:

FIG. 1 is a functional block diagram of an LDPC decoder according to an embodiment of the present invention.

FIG. 2 is an example of an LDPC encoding matrix according to an embodiment of the present invention.

DETAILED DESCRIPTION

The LDPC algorithm described herein accumulates results for the row operations during the column operations, so that additional time and memory are not required to store the results from the row operations while the column operations are conducted. One embodiment of the method according to the present invention is presented below for the purpose of example. The method is described in reference to a hardware embodiment of the invention, as given in FIG. 1.

Initialization Step

For each parity bit w and code bit v, calculate:

md_m[v]=Pv(0)/Pv(1),

md_g[v][w]=md_m[v], and

md_R[w]=md_m[v],

where Pv(0) and Pv(1) are the probabilities (from the Viterbi decoder) that bit v is equal to either 0 or 1, respectively. O(v) denotes the set of all parity bits w that include code bit v.
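The initialization of md_m and md_g can be sketched as follows. This is a minimal illustration, assuming Python and assuming that the sets O(v) are supplied as a mapping from each code bit to its parity bits; the names `initialize`, `p0`, `p1`, and `O_v` are hypothetical and do not appear in the specification. The value md_R[w] is left to the caller, because the text does not state which code bit v supplies md_m[v] when a parity bit w includes several code bits.

```python
def initialize(p0, p1, O_v):
    """Return (md_m, md_g) for the initialization step.

    p0[v], p1[v]: the values Pv(0) and Pv(1) for code bit v (p1[v] assumed nonzero).
    O_v[v]: the parity bits w that include code bit v (hypothetical input format).
    """
    md_m = [p0[v] / p1[v] for v in range(len(p0))]
    # md_g[v][w] = md_m[v] for every parity bit w that includes code bit v.
    md_g = {v: {w: md_m[v] for w in O_v[v]} for v in range(len(p0))}
    return md_m, md_g


# Two code bits; bit 0 is in parity bit 0, bit 1 is in parity bits 0 and 1.
md_m, md_g = initialize([0.75, 0.5], [0.25, 0.5], {0: [0], 1: [0, 1]})
```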

First Step of Iteration Process

Compute the following:

$\begin{matrix}{{S\lbrack v\rbrack} = {\left( {\prod\limits_{w \in {O{(v)}}}\frac{{md\_ R}\lbrack w\rbrack}{{{md\_ g}\lbrack v\rbrack}\lbrack w\rbrack}} \right) \cdot {{md\_ m}\lbrack v\rbrack}}} & (1) \\{{{{loc\_ item}\lbrack v\rbrack}\lbrack w\rbrack} = \frac{{md\_ R}\lbrack w\rbrack}{{{md\_ g}\lbrack v\rbrack}\lbrack w\rbrack}} & (2) \\{{{md\_ g}{{{\_ new}\lbrack v\rbrack}\lbrack w\rbrack}} = \frac{S\lbrack v\rbrack}{{{loc\_ item}\lbrack v\rbrack}\lbrack w\rbrack}} & (3) \\{{{{md\_ R}{{\_ new}\lbrack w\rbrack}} = {f^{- 1}\left( {\prod\limits_{v \in {O{(w)}}}{f\left( {{md\_ g}{{{\_ new}\lbrack v\rbrack}\lbrack w\rbrack}} \right)}} \right)}},} & (4)\end{matrix}$

where

${f(x)} = \frac{1 + x}{1 - x}$

(the Gallager function), O(w) is all of the code bits from the parity bit w, and O(v) is all of the parity bits w that include the code bit v.
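Formulas (1) through (4) can be illustrated with a short, non-pipelined sketch. It assumes Python, dictionary storage keyed by bit and parity indices, and an inverse f⁻¹(y)=(y−1)/(y+1) obtained by solving y=(1+x)/(1−x) for x; the function name `iterate_once` and the toy values are hypothetical and chosen only so that f is never evaluated at x=1.

```python
from math import prod


def f(x):
    """Gallager function f(x) = (1 + x) / (1 - x); undefined at x = 1."""
    return (1 + x) / (1 - x)


def f_inv(y):
    """Inverse of f, from solving y = (1 + x) / (1 - x) for x."""
    return (y - 1) / (y + 1)


def iterate_once(O_v, O_w, md_m, md_g, md_R):
    """Evaluate formulas (1)-(4) for all bits, without pipelining."""
    S, loc_item, md_g_new = {}, {}, {}
    for v, checks in O_v.items():
        # (2): ratio of the check value to the stored bit-to-check value
        loc_item[v] = {w: md_R[w] / md_g[v][w] for w in checks}
        # (1): product of those ratios, scaled by md_m[v]
        S[v] = prod(loc_item[v].values()) * md_m[v]
        # (3): divide out each parity bit's own contribution
        md_g_new[v] = {w: S[v] / loc_item[v][w] for w in checks}
    # (4): combine the new values over all code bits of each parity bit
    md_R_new = {w: f_inv(prod(f(md_g_new[v][w]) for v in bits))
                for w, bits in O_w.items()}
    return S, loc_item, md_g_new, md_R_new


# Toy example: three code bits, two parity bits (arbitrary values).
O_v = {0: [0], 1: [0, 1], 2: [1]}
O_w = {0: [0, 1], 1: [1, 2]}
md_m = {0: 3.0, 1: 0.5, 2: 2.0}
md_g = {v: {w: md_m[v] for w in O_v[v]} for v in O_v}
md_R = {0: 0.5, 1: 2.0}
S, loc_item, md_g_new, md_R_new = iterate_once(O_v, O_w, md_m, md_g, md_R)
```

Because S[v] already contains every loc_item[v][w], formula (3) needs only one division per (v,w) pair, which is what allows the row results to be accumulated while the column operations run.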

Calculations (1) and (2) above are performed for v=i, and calculations (3) and (4) are performed for v=i−1. Then, calculations (1) and (2) are performed for v=i+1, and calculations (3) and (4) are performed for v=i, and so on, through a pipeline architecture in the arithmetic unit, depicted in FIG. 1. When all of the code bits v have been processed, the values are assigned as given below,

md_g[v][w]=md_g_new[v][w]  (6)

md_R[w]=md_R_new[w]  (7)

for each message bit v and parity bit w. A single iteration as described above is performed. The so-called “hard decision” for each code bit v is performed during this single iteration, where:

$\begin{matrix} hard\_decision[v] = 0 \text{ if } \mathrm{sign}\left( \prod\limits_{w \in O(v)} loc\_item[v][w] \right) = 1 & (8) \end{matrix}$

and

$\begin{matrix} hard\_decision[v] = 1 \text{ if } \mathrm{sign}\left( \prod\limits_{w \in O(v)} loc\_item[v][w] \right) = -1 & (9) \end{matrix}$

Products for the formulas (8) and (9) were already calculated during calculation (1) for S[v]. Preferably, the calculations are performed in the logarithmic domain, so all products are replaced by sums as implemented in the arithmetic unit.
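A minimal sketch of the hard decision in the logarithmic domain follows. It assumes Python, assumes that the product in formulas (8) and (9) is replaced by a sum of log-domain loc_item values, and assumes that the sign test is applied to that sum. The function name `hard_decision` and the handling of a sum of exactly zero (mapped to 0 here) are illustrative choices that the text does not specify.

```python
def hard_decision(log_loc_items):
    """Formulas (8)/(9) with the product replaced by a log-domain sum.

    log_loc_items: log-domain loc_item[v][w] values for all w in O(v).
    Returns 0 for a non-negative sum and 1 for a negative sum
    (the zero case is an assumption of this sketch).
    """
    total = sum(log_loc_items)
    return 0 if total >= 0 else 1
```

For example, `hard_decision([0.5, -0.2])` sums to a positive value and decides 0, while `hard_decision([-1.0, 0.3])` sums to a negative value and decides 1.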

Parallel Architecture

One embodiment of the LDPC decoder as described herein includes a controller, an input fifo (first-in, first-out buffer, from the Viterbi decoder), an output fifo (first-in, first-out buffer for the final hard decision, or to another process, such as a Reed-Solomon computation), a pipeline, two interleavers, and t functional units of two types: Bit units and Par units, all as depicted in FIG. 1. The Bit units calculate data on bit nodes, and the Par units calculate data on check nodes.

Each Par unit preferably contains pipelined memories for storing values of md_R[w] and md_R_new[w]. Each Bit unit preferably contains pipelined memories for storing values of S[v], md_m[v], and loc_item. Each unit is preferably pipelined, meaning that it can store data for a few different nodes at the same time. In the embodiment depicted, the arithmetic unit is separated for simplification and to show all the more relevant connections. However, the present invention is applicable to a wide variety of arithmetic unit architectures that are capable of performing calculations (1) through (9) above. Also, in the embodiment as depicted in FIG. 1, memories are embedded into the arithmetic unit, but in other embodiments they could be separate from the arithmetic unit.

A special parity check is used for (m,n) matrices H for LDPC-codes, which parity check can be represented by a matrix (M,N) from permutation (r,r) cell H_(i,j), where m=M·r, n=N·r, and r(mod t)=0. An example of the matrix H is given in FIG. 2, where M=3, N=7, r=8, m=24, and n=56. The permutation matrix contains exactly one value of one in each sub row and sub column. To reduce the number of operations per circuit gate, circulant permutation matrices are used in one embodiment, which matrices are determined by formula:

p(j)=p(0)+j(mod r)

where p(i) is the index of the column with a value of one in the i^(th) row. For example, p(0)=2 for the upper left cell in FIG. 2 (where counting of both rows and columns starts with zero). Thus, we can use the initial index p(0) of one in the first row to determine each circulant permutation matrix. Similarly, the function c(j) returns the index of the row with a value of one in the j^(th) column.
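The functions p and c for one circulant cell can be sketched as below, assuming Python and list storage. The function names are hypothetical; the values p(0)=2 and r=8 are those given for the upper left cell in FIG. 2. The expression for c follows from inverting p(j)=p(0)+j (mod r).

```python
def circulant_row_to_col(p0, r):
    """p(j) = p(0) + j (mod r): column holding the one in row j."""
    return [(p0 + j) % r for j in range(r)]


def circulant_col_to_row(p0, r):
    """c(j): row holding the one in column j, i.e. the inverse of p."""
    return [(j - p0) % r for j in range(r)]


p = circulant_row_to_col(2, 8)  # upper left cell of FIG. 2: p(0) = 2, r = 8
c = circulant_col_to_row(2, 8)
```

Storing only p(0) per cell is sufficient, since every other entry of p and c is derived from it by a modular addition or subtraction.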

Groups of t columns from the matrix H are logically divided into stripes. Assume that we already have a value of md_g[v][w] for each pair (v,w), where w∈O(v), and a value of md_R[w] for each parity bit w. Starting from the first stripe, the following operations are performed in one embodiment. Calculate the addresses for the t memories that contain md_R for all of the check nodes, according to the following formula:

address(w)=cell_index(H_(i,j))(mod M)·(r/t)+c(v)/t  (10)

where c(v) is the row index of the value one in the column for v from Rd, and cell_index(H_(i,j))=i+j·M.
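Formula (10) can be illustrated with the following sketch, which assumes Python and assumes that both r/t and c(v)/t denote integer division (r is a multiple of t, and c(v)/t selects a word within the memory). The function names and the sample values t=4, i=1, j=2, and c(v)=5 are hypothetical.

```python
def cell_index(i, j, M):
    """cell_index(H_(i,j)) = i + j*M."""
    return i + j * M


def address(i, j, M, r, t, c_v):
    """Formula (10), assuming integer division for r/t and c(v)/t."""
    return (cell_index(i, j, M) % M) * (r // t) + c_v // t


# M = 3 and r = 8 as in FIG. 2; t = 4 is an assumed number of functional units.
example_address = address(i=1, j=2, M=3, r=8, t=4, c_v=5)
```

Under these assumptions, the cell index is 1+2·3=7, which reduces to 1 modulo M, so the address is 1·(8/4)+5/4=2+1=3.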

The value of md_R[w] for the t memories is input on the reverse interleaver that computes the permutation, according to the function:

π(i)=c(i)(mod t) for given H_(i,j).  (11)

Then, all of the values of md_R[w] are input to the right-most Bit unit to produce the sum S[v]. The method then continues with the same stripe in H_(i+1,j), H_(i+2,j), and so on.

For the second and subsequent stripes, we calculate the value loc_item and accumulate the sum S[v] for the current bits as described above, and retain the previously computed values of S[v] and loc_item for the bits from the previous stripe in the pipeline in the Bit unit. Then the values of S[v] and loc_item are retrieved from the pipeline and rearranged through the direct interleaver, which computes the permutation τ according to the function:

τ(π(i))=i, where π,τ∈H_(i,j).  (12)

and then calculates the values md_g_new and md_R_new according to formulas (3) and (4) for both v and w from the pipeline. When all the stripes have been processed in this manner, the values of md_g_new and md_R_new are used to replace the values of md_g and md_R as given in equations (6) and (7), and one cycle of the iteration is completed.
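The two interleaver permutations of formulas (11) and (12) can be sketched as follows. The sketch assumes Python, assumes a circulant cell so that c(j)=(j−p(0)) mod r, and reads c(i) in formula (11) as c evaluated at the i^(th) column of the current stripe within the cell. The function names, the parameter `col0` (the first column of the stripe inside the cell), and the value t=4 are hypothetical.

```python
def reverse_permutation(c, col0, t):
    """pi(i) = c(i) (mod t) for the t columns col0 .. col0+t-1 of one cell."""
    return [c[col0 + i] % t for i in range(t)]


def direct_permutation(pi):
    """tau with tau(pi(i)) = i, i.e. the inverse of pi."""
    tau = [0] * len(pi)
    for i, target in enumerate(pi):
        tau[target] = i
    return tau


r, t, p0 = 8, 4, 2
c = [(j - p0) % r for j in range(r)]  # c(j) for a circulant cell with p(0) = 2
pi = reverse_permutation(c, col0=4, t=t)
tau = direct_permutation(pi)
```

Because r is a multiple of t, the t values c(i) (mod t) within one stripe of a circulant cell are all distinct, so π is a permutation and its inverse τ is well defined.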

Block-Schema of Algorithm

1. Starting with a k^(th) stripe and a cell H_(i,j) with index s.
2. Calculate AR_BIT[i].md_m=md_m[v[k_(t+i)]], where i=0, . . . , t−1.
3. Calculate the addresses for w^(s)∈O(v) for v from cell H_(i,j) with index s according to formula (10).
4. Calculate AR_BIT[i].md_R=AR_PAR[π^(s)(i)].md_R[w^(s)], where π^(s) is the reverse permutation for the cell with index s.
5. Calculate AR_BIT[i].md_g=md_g[v[k_(t+i)]][w^(s)].
6. Calculate AR_BIT[i].loc_item[v[k_(t+i)]][w^(s)] according to formula (2).
7. Calculate AR_BIT[i].S[v[k_(t+i)]] according to formula (1).
8. Calculate AR_PAR[i].loc_item[v[(k−1)_(t+i)]][w^(s−M)]=AR_BIT[τ^(s−M)[i]].loc_item[v[(k−1)_(t+i)]][w^(s−M)] and AR_PAR[i].S[v[(k−1)_(t+i)]]=AR_BIT[τ^(s−M)[i]].S[v[(k−1)_(t+i)]], where τ^(s−M) is the direct permutation for the cell with an index of s−M.
9. Calculate AR_BIT[i].md_g_new[v[(k−1)_(t+i)]][w^(s−M)] according to formula (3).
10. Calculate AR_BIT[i].md_R_new[w^(s−M)] according to formula (4).
11. Go to the next cell, with an index of s+1.
12. If s+1 (mod M)=0, then go to the next stripe (k+1).
13. If all cells pass step 12 above, then assign the values as given in equations (6) and (7), and start a new iteration for the 0^(th) stripe and the 0^(th) cell.
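The traversal order of the steps above can be sketched as a simple generator. This is an interpretation, assuming Python and assuming that the running index s advances by one per cell and that M cells are visited per stripe, so that index s−M refers to the same cell row in the previous stripe. The function name `cell_schedule` is hypothetical, and the sketch does not model how the last stripe is drained from the pipeline before step 13.

```python
def cell_schedule(num_stripes, M):
    """Yield (s, k, prev) in the order implied by steps 1, 11 and 12.

    s: running cell index; k: stripe index; prev: the index s - M whose
    values are completed by steps 8-10 while steps 2-7 work on s
    (None while the pipeline holds no earlier stripe).
    """
    s = 0
    for k in range(num_stripes):
        for _ in range(M):
            yield s, k, (s - M if s >= M else None)
            s += 1


schedule = list(cell_schedule(num_stripes=2, M=3))
```

With two stripes and M=3, the first three entries have no previous cell to complete, and the next three pair cells 3, 4, and 5 with cells 0, 1, and 2 of the preceding stripe.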

The foregoing description of preferred embodiments for this invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Obvious modifications or variations are possible in light of the above teachings. The embodiments are chosen and described in an effort to provide the best illustrations of the principles of the invention and its practical application, and to thereby enable one of ordinary skill in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. All such modifications and variations are within the scope of the invention as determined by the appended claims when interpreted in accordance with the breadth to which they are fairly, legally, and equitably entitled.

1. In an LDPC decoder implementing an iterative message-passing algorithm, the improvement comprising a pipeline architecture such that the decoder accumulates results for row operations during column operations, such that additional time and memory are not required to store results from the row operations beyond that required for the column operations.